47 research outputs found

    Handwriting Recognition of Historical Documents with few labeled data

    Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train a handwritten text recognition (HTR) system. In some scenarios, transcripts are only available at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with little labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of the manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme that accounts for the variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition.

    Pre-Processing of Degraded Printed Documents by Non-Local Means and Total Variation

    In this study we compare two image restoration approaches for the pre-processing of printed documents: the Non-local Means filter and a total variation minimization approach. We apply these two approaches to printed document sets from various periods, and we evaluate their effectiveness through character recognition performance using an open source OCR. Our results show that for each document set, one or both pre-processing methods improve character recognition accuracy over recognition without pre-processing. Higher accuracies are obtained with Non-local Means when characters have a low level of degradation, since they can be restored from similar neighboring parts of non-degraded characters. The Total Variation approach is more effective when characters are highly degraded and can only be restored through modeling instead of using neighboring data.

    Effect of Pre-Processing on Binarization

    The effects of different image pre-processing methods on document image binarization are explored. They are compared using five different binarization methods on images with bleed-through and stains as well as on images with uniform background speckle. The choice of binarization method has a significant effect on binarization accuracy, but the pre-processing also plays a significant role. Among the pre-processing methods tested, the Total Variation method shows the best performance.

    How major depressive disorder affects the ability to decode multimodal dynamic emotional stimuli

    Most studies investigating the processing of emotions in depressed patients have reported impairments in the decoding of negative emotions. However, these studies adopted static stimuli (mostly stereotypical facial expressions corresponding to basic emotions), which do not reflect the way people experience emotions in everyday life. For this reason, this work proposes to investigate the decoding of emotional expressions in patients affected by Recurrent Major Depressive Disorder (RMDDs) using dynamic audio/video stimuli. RMDDs' performance is compared with the performance of patients with Adjustment Disorder with Depressed Mood (ADs) and healthy subjects (HCs). The experiments involve 27 RMDDs (16 with acute depression - RMDD-A, and 11 in a compensation phase - RMDD-C), 16 ADs and 16 HCs. The ability to decode emotional expressions is assessed through an emotion recognition task based on short audio (without video), video (without audio) and audio/video clips. The results show that AD patients are significantly less accurate than HCs in decoding fear, anger, happiness, surprise and sadness. RMDD-As are significantly less accurate than HCs in decoding happiness, sadness and surprise. Finally, no significant differences were found between HCs and RMDD-Cs. The different communication channels and the types of emotion play a significant role in limiting the decoding accuracy.

    A Mask-Based Enhancement Method for Historical Documents

    This paper proposes a novel method for document enhancement. The method is based on the combination of two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation)-regularized image where background noise has been reduced. The masked image is then filtered by NLmeans (Non-Local Means), which reduces the noise in the text areas located by the mask. The document images to be enhanced are real historical documents from several periods which include several defects in their background. These defects result from scanning, paper aging and bleed-through. We measure the improvement brought by this enhancement method through OCR accuracy.

    Enhancement of Historical Printed Document Images by Combining Total Variation Regularization and Non-Local Means Filtering

    This paper proposes a novel method for document enhancement which combines two recent powerful noise-reduction steps. The first step is based on the total variation framework. It flattens background grey-levels and produces an intermediate image where background noise is considerably reduced. This image is used as a mask to produce an image with a cleaner background while keeping character details. The second step is applied to the cleaner image and consists of a filter based on non-local means: character edges are smoothed by searching for similar patch images in pixel neighborhoods. The document images to be enhanced are real historical printed documents from several periods which include several defects in their background and on character edges. These defects result from scanning, paper aging and bleed-through. The proposed method enhances document images by combining the total variation and the non-local means techniques in order to improve OCR recognition. The method is shown to be more powerful than either technique used alone and than other enhancement methods.

    Réseaux Bayésiens Dynamiques pour la reconnaissance des caractères imprimés dégradés (Dynamic Bayesian Networks for the Recognition of Degraded Printed Characters)

    The aim of this work is to present a new approach for recognizing degraded printed characters. Our approach builds two hidden Markov models (HMMs) using dynamic Bayesian networks, called the vertical HMM and the horizontal HMM. A vertical HMM (respectively, horizontal HMM) is a model whose input sequence is the pixel columns of the character (respectively, the pixel rows). We then couple these chains according to two coupling models using dynamic Bayesian networks. Experimental results show that the coupled models increase the recognition rate by 8% to 10% relative to the recognition system using the uncoupled models.

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

    Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

    The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing that allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution to avoid this limitation. It consists of associating a powerful optical recognition system, which copes with image noise and variability, with a language model based on sub-lexical units, which models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing the lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER) and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bi-directional Long-Short Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Nets (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.
    Work partially supported by projects READ: Recognition and Enrichment of Archival Documents - 674943 (European Union's H2020) and CoMUN-HaT: Context, Multimodality and User Collaboration in Handwritten Text Processing - TIN2015-70924-C2-1-R (MINECO/FEDER), and a DGA-MRIS (Direction Generale de l'Armement - Mission pour la Recherche et l'Innovation Scientifique) scholarship. Granell, E.; Chammas, E.; Likforman-Sulem, L.; Martínez-Hinarejos, C.; Mokbel, C.; Cirstea, B. (2018). Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks. Journal of Imaging. 4(1). https://doi.org/10.3390/jimaging4010015
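The abstract above reports results as Word Error Rate (WER) and Character Error Rate (CER). As a rough illustration only, the sketch below computes both in the usual way, as the Levenshtein edit distance between a reference and a hypothesis transcription divided by the reference length. The function names, the whitespace tokenization and the sample strings are assumptions made for this example; the paper's own evaluation tooling and text normalization are not described in the abstract and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from the first i items of ref to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word Error Rate, assuming whitespace-separated words (an assumption of this sketch)."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character Error Rate over raw characters, spaces included (an assumption of this sketch)."""
    return error_rate(list(reference), list(hypothesis))


if __name__ == "__main__":
    # Hypothetical transcription pair, not taken from the Rodrigo dataset.
    reference = "el rey don fernando"
    hypothesis = "el rei don fernando"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because one misrecognized character makes the whole word count as wrong, WER is usually higher than CER on the same output, which is why the two are commonly reported together.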